Unsupervised Cross-Domain Word Representation Learning
Authors
Abstract
The meaning of a word varies from one domain to another. Despite this important domain dependence in word semantics, existing word representation learning methods are bound to a single domain. Given a pair of source-target domains, we propose an unsupervised method for learning domain-specific word representations that accurately capture the domain-specific aspects of word semantics. First, we select a subset of frequent words that occur in both domains as pivots. Next, we optimize an objective function that enforces two constraints: (a) for both source and target domain documents, pivots that appear in a document must accurately predict the co-occurring non-pivots, and (b) word representations learnt for pivots must be similar in the two domains. Moreover, we propose a method to perform domain adaptation using the learnt word representations. Our proposed method significantly outperforms competitive baselines, including the state-of-the-art domain-insensitive word representations, and reports the best sentiment classification accuracies for all domain pairs in a benchmark dataset.
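To make the two constraints concrete, the sketch below gives one plausible reading of such an objective in Python/NumPy: a squared-error term in which pivot vectors predict their co-occurring non-pivots in each domain, plus a regularizer pulling the two domain-specific pivot representations together. The variable names, the log-count prediction target, and the weight `lam` are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def objective(W_src, W_tgt, C_src, C_tgt, cooc_src, cooc_tgt, pivots, lam=1.0):
    """W_src, W_tgt : pivot embeddings in the source/target domain (V x d)
       C_src, C_tgt : non-pivot (context) embeddings per domain (V x d)
       cooc_src, cooc_tgt : (pivot_id, nonpivot_id, count) co-occurrence triples
       pivots       : indices of pivot words shared by both domains
       lam          : weight of the cross-domain pivot regularizer"""
    loss = 0.0
    # (a) in each domain, pivots must predict their co-occurring non-pivots
    for W, C, cooc in ((W_src, C_src, cooc_src), (W_tgt, C_tgt, cooc_tgt)):
        for p, n, count in cooc:
            pred = W[p] @ C[n]                      # predicted association strength
            loss += count * (pred - np.log(count)) ** 2
    # (b) pivot representations must stay similar across the two domains
    for p in pivots:
        loss += lam * np.sum((W_src[p] - W_tgt[p]) ** 2)
    return loss
```

In practice such an objective would be minimized with stochastic gradient descent over co-occurrence triples from both domains, and the learnt source and target representations would then be combined for cross-domain sentiment classification, as the abstract describes.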
Similar resources
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique given a large amount of labeled data with similar attributes in different domains. In real-world applications there are huge amounts of data, but most of them are unlabeled. Domain adaptation is effective in image classification, where obtaining adequate labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...
From Bilingual Dictionaries to Interlingual Document Representations
Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We fi...
Acquiring Domain-Specific Dialog Information from Task-Oriented Human-Human Interaction through an Unsupervised Learning
We describe an approach for acquiring the domain-specific dialog knowledge required to configure a task-oriented dialog system that uses human-human interaction data. The key aspects of this problem are the design of a dialog information representation and a learning approach that supports capture of domain information from in-domain dialogs. To represent a dialog for a learning purpose, we bas...
The Benefits of Word Embeddings Features for Active Learning in Clinical Information Extraction
This study investigates the use of unsupervised word embeddings and sequence features for sample representation in an active learning framework built to extract clinical concepts from clinical free text. The objective is to further reduce the manual annotation effort while achieving higher effectiveness compared to a set of baseline features. Unsupervised features are derived from skip-gram wor...
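For readers unfamiliar with such features, the following is a minimal sketch of deriving unsupervised skip-gram embeddings as sample features, here using gensim's Word2Vec; the toy corpus and hyperparameter values are placeholders rather than the study's actual setup.

```python
# Minimal sketch (not the study's code): training skip-gram word embeddings
# with gensim and reading one vector back as a feature. The toy corpus and
# hyperparameters below are placeholders.
from gensim.models import Word2Vec

corpus = [["patient", "denies", "chest", "pain"],
          ["history", "of", "hypertension"]]          # tokenized clinical notes (toy)
model = Word2Vec(corpus, vector_size=100, window=5,
                 sg=1, min_count=1)                   # sg=1 selects the skip-gram model
feature_vector = model.wv["patient"]                  # embedding used as a sample feature
```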
Integrating Domain and Paradigmatic Similarity for Unsupervised Sense Tagging
An unsupervised methodology for Word Sense Disambiguation, called Dynamic Domain Sense Tagging, is presented. It relies on the convergence of two well-known unsupervised approaches (i.e. Domain Driven Disambiguation and Conceptual Density). For each target word a domain is dynamically modeled by expanding its topical context, i.e. a set of words evoking the underlying/implicit domain wh...
Publication date: 2015